Description

BACKGROUND OF THE INVENTION 

This invention relates to CPUs such as those in minicomputers or microcomputers, and particularly to a data processor suitable for high-speed operation.

Various means have hitherto been devised for the high-speed operation of computers; the typical one is the pipeline system. A pipeline system does not complete the processing of one instruction before execution of the next instruction starts. Instead, each instruction is divided into a plurality of stages and the instructions are executed in a bucket-relay manner: when one instruction is about to enter its second stage, execution of the first stage of the next instruction, which is similarly divided into stages, begins. This system is described in detail in the book "ON THE PARALLEL COMPUTER STRUCTURE", written by Shingi Tomita, published by Shokodo, pages 25 to 68. With an n-stage pipeline, n instructions can be in progress at once, one in each stage, so that although each instruction still passes through every stage, the processing of one instruction is completed at every pipeline pitch.

EP-A-0 101 596 describes a data processor comprising an instruction storage unit, a decode and distribution unit, and two pipeline-controlled instruction units for reading out the instructions, decoding the instruction type, reading out the registers required for the execution of operations, and controlling the execution by two operation units. The first operation unit is dedicated to add/subtract instructions and logical operation instructions. The second operation unit is dedicated to multiply instructions and logical operation instructions. Detectors are provided which detect an operand conflict between a second instruction and a preceding instruction to be executed by the first or the second operation unit, respectively.

It is well known that the instruction architecture of a computer has a large effect on its processing method and performance. From the instruction-architecture point of view, computers can be grouped into CISC (Complex Instruction Set Computer) and RISC (Reduced Instruction Set Computer). A CISC processes complicated instructions by means of microinstructions, whereas a RISC handles simple instructions and instead achieves high-speed computation with hard-wired logic control, without microinstructions. The hardware and the pipeline operation of both the conventional CISC and RISC are summarized below.

Fig. 2 shows the general construction of a CISC-type computer. There are shown a memory interface 200, a program counter (PC) 201, an instruction cache 202, an instruction register 203, an instruction decoder 204, an address calculation control circuit 205, a control storage (CS) 206 in which microinstructions are stored, a microprogram counter (MPC) 207, a microinstruction register 208, a decoder 209, a register MDR (Memory Data Register) 210 which exchanges data with the memory, a register MAR (Memory Address Register) 211 which indicates the operand address in the memory, an address adder 212, a register file 213, and an ALU (Arithmetic Logical Unit) 214.

The operation of this construction is briefly described. The instruction indicated by the PC 201 is read out from the instruction cache and supplied through a signal 217 to the instruction register 203, where it is set. The instruction decoder 204 receives the instruction through a signal 218 and sets the start address of the corresponding microinstructions in the microprogram counter 207 through a signal 220. The address calculation control circuit 205 is instructed through a signal 219 how to calculate the address. The address calculation control circuit 205 reads the registers necessary for the address calculation and controls the address adder 212. The contents of the registers necessary for the address calculation are supplied from the register file 213 through buses 226, 227 to the address adder 212. Meanwhile, a microinstruction is read from the CS 206 in every machine cycle, decoded by the decoder 209, and used to control the ALU 214 and the register file 213 by means of a control signal 224. The ALU 214 operates on data fed from the registers through buses 228, 229, and the result is stored back in the register file 213 through a bus 230. The memory interface 200 is the circuit for exchanging data with the memory, such as fetching instructions and operands.

The pipeline operation of the computer shown in Fig. 2 is described with reference to Figs. 3, 4 and 5. The pipeline consists of six stages. At the IF (Instruction Fetch) stage, an instruction is read from the instruction cache 202 and set in the instruction register 203. At the D (Decode) stage, the instruction decoder 204 decodes the instruction. At the A (Address) stage, the address adder 212 calculates the operand address. At the OF (Operand Fetch) stage, the operand at the address pointed to by the MAR 211 is fetched through the memory interface 200 and set in the MDR 210. At the EX (Execution) stage, data is read from the register file 213 and the MDR 210 and fed to the ALU 214, where the operation is performed. At the final W (Write) stage, the result is stored through the bus 230 in one register of the register file 213.

Fig. 3 shows the continuous processing of ADD instructions, a basic instruction. One instruction is processed in each machine cycle, and the ALU 214 and the address adder 212 operate in parallel.

Fig. 4 shows the processing of the conditional branch instruction BRAcc. A flag is produced by the TEST instruction, and Fig. 4 shows the flow when the condition is met. Since the flag is produced at the EX stage, a three-cycle wait is necessary before the branch-target instruction is fetched. The more pipeline stages there are, the more waiting cycles are needed, which becomes a bottleneck for performance enhancement.

Fig. 5 shows the execution flow of a complicated instruction; instruction 1 is the complicated instruction. A complicated instruction, such as a string copy, requires a great number of memory accesses and is normally processed by repeating the EX stage many times. The EX stage is controlled by the microprogram, which is accessed once per machine cycle. In other words, the complicated instruction is processed by reading the microprogram a plurality of times. Since only one instruction can be at the EX stage at a time, the next instruction (instruction 2 in Fig. 5) must wait. In such a case, the ALU 214 operates all the time while the address adder 212 idles.

The RISC-type computer is described next. Fig. 6 shows the general construction of a RISC-type computer. There are shown a memory interface 601, a program counter 602, an instruction cache 603, a sequencer 604, an instruction register 605, a decoder 606, a register file 607, an ALU 608, an MDR 609, and an MAR 610.

Fig. 7 shows the process flow for basic instructions. At the IF (Instruction Fetch) stage, the instruction pointed to by the program counter 602 is read from the instruction cache and set in the instruction register 605. The sequencer 604 controls the program counter 602 in response to an instruction signal 615 and a flag signal 616 from the ALU 608. At the R (Read) stage, the contents of the registers pointed to by the instruction are transferred through buses 618, 619 to the ALU 608. At the E (Execution) stage, the ALU 608 performs the arithmetic operation. Finally, at the W (Write) stage, the result is stored in the register file 607 through a bus 620.

In the RISC-type computer, the instructions are limited to basic instructions. Arithmetic operations are performed only between registers, and the only instructions that access operands in memory are the load instruction and the store instruction. A complicated instruction can be realized by a combination of basic instructions. Without microinstructions, the contents of the instruction register 605 are decoded directly by the decoder 606 and used to control the ALU 608 and other units.

Fig. 7 shows the process flow for register-register arithmetic operations. The pipeline consists of four stages because the instructions are simple.

Fig. 8 shows the process flow for a conditional branch. Compared with the CISC-type computer, the number of pipeline stages is small, so the wait is only one cycle. However, in addition to inter-register operations, it is necessary to load operands from memory and store operands in memory. In the CISC-type computer, loading an operand from memory can be performed in one machine cycle because of the address adder, whereas in the RISC-type computer shown in Fig. 6 the load requires two machine cycles because it is decomposed into an address calculation instruction and a load instruction.

The problems with the above-mentioned prior art are briefly as follows. In the CISC-type computer, although a memory-register instruction can be executed in one machine cycle because of the address adder, the overhead at the time of branching is large because of the large number of pipeline stages. Moreover, only the EX stage is repeated when a complicated instruction is executed, and as a result the address adder idles.

In the RISC-type computer, the overhead at the time of branching is small because of the small number of pipeline stages. However, since there is no address adder, a memory-register operation requires two instructions: a load instruction and an inter-register operation instruction.

SUMMARY OF THE INVENTION 

Accordingly, it is a first object of this invention to pro- 
vide a data processor capable of making effective use 
of a plurality of arithmetic operation units to enhance the 
processing ability. 

It is a second object of this invention to provide a data processor capable of reducing the overhead at the time of branching.

It is a third object of this invention to provide a data processor capable of reducing the processing time for a complicated instruction for memory-register operation.

The above objects can be achieved by providing a plurality of arithmetic operation units sharing the register file, simplifying the instructions to decrease the number of pipeline stages, and reading a plurality of instructions in one machine cycle to control the plurality of arithmetic operation units.

The invention is set out in claims 1 and 10. 
According to the preferred embodiments of this invention, a complex instruction is decomposed into basic instructions, and a plurality of instructions are read at a time in one machine cycle and executed, so that the plurality of arithmetic operation units can operate simultaneously to enhance the processing ability.

Moreover, since the instructions are functionally simple and the number of pipeline stages can be decreased, the overhead at the time of branching can be reduced.

Furthermore, since the plurality of arithmetic operation units operate in parallel, the processing time for a complicated instruction can be reduced.

BRIEF DESCRIPTION OF THE DRAWINGS 

Fig. 1 is a block diagram of the whole construction 
of one embodiment of this invention. 

Fig. 2 is a block diagram of the whole construction of a conventional example.

Figs. 3 to 5 are timing charts for the operation thereof.

Fig. 6 is a block diagram of the whole construction of another conventional example.

Figs. 7 and 8 are timing charts for the operation 
thereof. 

Fig. 9 shows the list of instructions to be used in one 
embodiment of this invention. 

Fig. 10 shows the format of the instruction associated with the embodiment of this invention.

Figs. 11 to 14 are timing charts for the operation of 
the embodiment of this invention. 

Fig. 15 is a timing chart for the operation of the conventional example.

Figs. 16 to 18 are timing charts for the operation of 
the embodiment of this invention. 

Fig. 19 is a construction diagram of the first arith- 
metic operation unit 110 in Fig. 1. 

Fig. 20 is a construction diagram of the second arithmetic operation unit 112 in Fig. 1.

Fig. 21 is a construction diagram of the register file 
111 in Fig. 1. 

Figs. 22 to 25 are diagrams useful for explaining the embodiment of this invention shown in Fig. 1.

Fig. 26 is a construction diagram of the instruction 
unit 103 in Fig. 1. 

Fig. 27 is a diagram useful for explaining the oper- 
ation thereof. 

Fig. 28 is a construction diagram of the cache 2301 in Fig. 26.

Fig. 29 is another construction diagram of the in- 
struction unit 103 in Fig. 1. 

Fig. 30 is a timing chart for the operation of the embodiment of this invention.

Figs. 31A and 31B show the instruction formats.

Fig. 32 is a block diagram of the whole construction 
of another embodiment of this invention. 

Figs. 33 to 36 are diagrams of other embodiments of this invention, which perform simultaneous partial processing of a plurality of instructions.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

One embodiment of this invention will be described. 

Fig. 9 is the list of instructions executed by the processor of this embodiment. The basic instructions are all executed as inter-register operations. There are four branch instructions: an unconditional branch instruction BRA, a conditional branch instruction BRAcc (cc indicates the branch condition), a branch-to-subroutine instruction CALL, and a return-from-subroutine instruction RTN. In addition to these instructions, a load instruction LOAD and a store instruction STORE are provided. For convenience of explanation, the data format is only 32-bit integers, although it is not limited thereto. One address holds one 32-bit (4-byte) instruction. For the sake of simplicity, the number of instructions is limited as above, but it may be increased as long as each instruction can be processed in one machine cycle.

Fig. 10 shows the instruction format. All instructions have a fixed length of 32 bits. The F, S1, S2, and D fields of the basic instruction are, respectively, the bit or bits indicating whether the arithmetic operation result should be reflected in the flag, the field indicating the first source register, the field indicating the second source register, and the field indicating the destination register.

Fig. 1 shows the construction of this embodiment. There are shown a memory interface 100, a 32-bit program counter 101, a sequencer 102, an instruction unit 103, a 32-bit first instruction register 104, a 32-bit second instruction register 105, a first decoder 106, a second decoder 107, an MDR 108, an MAR 109, a first arithmetic operation unit 110, a register file 111, and a second arithmetic operation unit 112.

In this embodiment, two instructions are read and executed in parallel in one machine cycle. Figs. 11 to 14 show the pipeline processing in this embodiment. The pipeline comprises four stages: IF (Instruction Fetch), R (Read), EX (Execution), and W (Write).

The operation of this embodiment will be described with reference to Fig. 1.

At the IF stage, the two instructions pointed to by the program counter are read and set in the first and second instruction registers 104 and 105 through buses 115 and 117, respectively. When the PC value is even, the instruction at address PC is stored in the first instruction register and the instruction at address PC + 1 in the second instruction register. When the PC value is odd, the NOP instruction is set in the first instruction register and the instruction at address PC in the second instruction register. The sequencer 102 is the circuit that controls the program counter. When neither the first nor the second instruction register holds a branch instruction, the program counter is incremented by 2. At the time of branching, the branch address is computed and set in the program counter. For a conditional branch, the sequencer decides whether or not to branch on the basis of the flag information 123 from the first arithmetic operation unit and the flag information 124 from the second arithmetic operation unit. The signal 116 from the instruction unit is the conflict signal, which indicates various conflicts between the first and second instructions. When the conflict signal is asserted, the hardware controls execution so that the conflict is avoided. The method of avoiding conflicts is described in detail later.
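The IF-stage rule and the sequencer's normal increment can be sketched as follows. This is a minimal model, not the patent's circuit: memory is a Python list indexed by instruction address, instructions are plain strings, and `fetch_pair` and `next_pc` are hypothetical names. The text only gives the "+ 2" increment; for an odd PC (for example a branch target) the sketch assumes the sequencer advances to the next even address so that no instruction is skipped.

```python
NOP = "NOP"


def fetch_pair(memory, pc):
    """IF stage: contents of the (first, second) instruction registers.

    Even PC: the instructions at PC and PC + 1.
    Odd PC:  NOP in the first register, the instruction at PC in the second.
    """
    if pc % 2 == 0:
        return memory[pc], memory[pc + 1]
    return NOP, memory[pc]


def next_pc(pc, branch_target=None):
    """Sequencer: the branch target when branching, else advance past the pair.

    For an even PC this is PC + 2, as stated in the text. For an odd PC the
    sketch assumes a move to the next even address (PC + 1).
    """
    if branch_target is not None:
        return branch_target
    return pc + 2 - (pc % 2)


if __name__ == "__main__":
    program = ["i0", "i1", "i2", "i3", "i4", "i5"]
    pc = 0
    for _ in range(3):
        print(pc, fetch_pair(program, pc))
        pc = next_pc(pc)
```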

The operation of the R stage when processing basic instructions is as follows. At the R stage, the contents of the first instruction register 104 are decoded by the first decoder 106, and the contents of the second instruction register 105 are decoded by the second decoder 107. As a result, the contents of the register pointed to by the first source register field S1 of the first instruction register 104 are fed to the first arithmetic operation unit 110 through a bus 125, and the contents of the register pointed to by the second source register field S2 are fed to it through a bus 126. Likewise, the contents of the register pointed to by the first source register field S1 of the second instruction register are fed through a bus 127 to the second arithmetic operation unit 112, and the contents of the register pointed to by the second source register field S2 are fed to it through a bus 128.

The operation of the EX stage is described next. At the EX stage, the first arithmetic operation unit 110 performs an arithmetic operation on the data fed through the buses 125 and 126 in accordance with the OP code in the first instruction register. At the same time, the second arithmetic operation unit 112 performs an arithmetic operation on the data fed through the buses 127 and 128 in accordance with the OP code in the second instruction register 105.

Finally, the operation of the W stage is described. At the W stage, the result from the first arithmetic operation unit 110 is stored through a bus 129 in the register pointed to by the destination field D of the first instruction register. Likewise, the result from the second arithmetic operation unit 112 is stored through a bus 131 in the register pointed to by the destination field D of the second instruction register.
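The R, EX and W stages for a pair of non-conflicting basic instructions can be sketched as a single function. The `Instr` record, the operation subset (ADD, SUB, AND, OR) and the 32-bit wraparound are illustrative assumptions, not the encoding of Fig. 9 or Fig. 10; conflicting pairs are assumed to have been filtered out beforehand, as the text describes for the conflict signal.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    """Decoded basic instruction (hypothetical representation of Fig. 10)."""
    op: str
    s1: int
    s2: int
    d: int
    f: bool = False  # whether the result should update the flag


MASK = 0xFFFFFFFF  # 32-bit data format
# Illustrative subset of the basic operations.
ALU_OPS = {
    "ADD": lambda a, b: (a + b) & MASK,
    "SUB": lambda a, b: (a - b) & MASK,
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
}


def execute_pair(regs, first, second):
    """Execute two non-conflicting basic instructions in one machine cycle."""
    # R stage: read all four source operands (buses 125 to 128).
    a1, b1 = regs[first.s1], regs[first.s2]
    a2, b2 = regs[second.s1], regs[second.s2]
    # EX stage: the two arithmetic operation units work in parallel.
    r1 = ALU_OPS[first.op](a1, b1)
    r2 = ALU_OPS[second.op](a2, b2)
    # W stage: both results are written to the shared register file.
    regs[first.d] = r1
    regs[second.d] = r2
    return regs
```

For example, with `regs = list(range(16))`, executing `Instr("ADD", 1, 2, 5)` and `Instr("SUB", 4, 3, 6)` together leaves 3 in register 5 and 1 in register 6.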

Fig. 11 shows the processing flow for continuous basic instructions. Two instructions are processed at a time in one machine cycle; in this example, the first and second arithmetic operation units always operate in parallel.

Fig. 12 shows the processing flow when a load or store instruction is processed as the first instruction and a basic instruction as the second instruction, continuously. When the load instruction is executed, at the R stage the contents of the register specified by the S2 field of the first instruction register are transferred through the bus 126 to the MAR 109. At the EX stage, the operand is fetched through the memory interface 100. Finally, at the W stage, the fetched operand is stored through the bus 129 in the register specified by the destination field D of the first instruction register.

The operand can be fetched in one machine cycle at the EX stage if a high-speed cache is provided in the memory interface. This is particularly easy to achieve if the whole computer shown in Fig. 1 is integrated on a semiconductor substrate with the instruction cache and data cache on the chip. Of course, on a cache miss the operand fetch cannot be finished in one machine cycle. In that case, the system clock is stopped and the EX stage is extended. This is also done in conventional computers.

When the store instruction is executed, at the R stage the contents of the register pointed to by the first source register field S1 of the first instruction register are transferred as data through the bus 125 to the MDR 108. At the same time, the contents of the register pointed to by the second source register field S2 of the first instruction register are transferred as the address through the bus 126 to the MAR 109. At the EX stage, the data in the MDR 108 is written to the address pointed to by the MAR 109.
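The load and store sequences above, for a LOAD or STORE in the first-instruction position, can be sketched as follows. The class name `MemoryUnit` is hypothetical; memory is modeled as a dictionary from address to word, unwritten addresses are assumed to read as 0, and cache misses (which stop the clock and extend the EX stage) are not modeled.

```python
class MemoryUnit:
    """Sketch of LOAD/STORE executed as the first instruction (Fig. 12)."""

    def __init__(self, regs, memory):
        self.regs = regs      # register file 111, as a list
        self.memory = memory  # dict: address -> 32-bit word
        self.mar = 0          # Memory Address Register 109
        self.mdr = 0          # Memory Data Register 108

    def load(self, s2, d):
        # R stage: register named by S2 -> MAR via bus 126.
        self.mar = self.regs[s2]
        # EX stage: operand fetch through the memory interface
        # (unwritten addresses are assumed to read as 0).
        fetched = self.memory.get(self.mar, 0)
        # W stage: operand -> register named by D via bus 129.
        self.regs[d] = fetched

    def store(self, s1, s2):
        # R stage: data (S1) -> MDR via bus 125, address (S2) -> MAR via bus 126.
        self.mdr = self.regs[s1]
        self.mar = self.regs[s2]
        # EX stage: write the MDR contents to the address held in the MAR.
        self.memory[self.mar] = self.mdr
```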

As shown in Fig. 12, two instructions can be processed at a time in one machine cycle even when the first instruction is a load or store instruction. The case where a load or store instruction appears as the second instruction is described in detail later.

Fig. 13 shows the process flow when the unconditional branch instruction BRA is executed as the second instruction. When the BRA instruction is read, at the R stage the sequencer 102 adds the displacement field d to the program counter and sets the result in the program counter 101. During this time, the two instructions following the BRA instruction (instructions 1 and 2 in Fig. 13) are read. In the next cycle, the two instructions at the branch destination are read. In this embodiment, the hardware is able to execute instructions 1 and 2. In other words, no waiting cycle occurs even when a branch instruction is processed. This approach is called the delayed branch and is used in conventional RISC-type computers. However, a conventional RISC-type computer can execute only one instruction during the computation of the branch address, whereas this embodiment can execute two instructions at a time during that computation, and thus has a higher processing ability. The same is true of the processing flow of the CALL and RTN instructions. The compiler generates code so that as many useful instructions as possible are executed during the computation of the branch address; when there is nothing useful to do, instructions 1 and 2 in Fig. 13 are made NOP instructions. In that case, a wait of substantially one machine cycle occurs. However, since the number of pipeline stages is small, the overhead at the time of branching is still smaller than in the CISC-type computer of the conventional example.

Fig. 14 shows the processing flow of the conditional branch instruction BRAcc. The flag is set by the instruction denoted ADD, F (an ADD instruction with the F bit set), and whether the branch condition is met is decided according to the result. As with the unconditional branch described with reference to Fig. 13, the instruction following the BRAcc instruction (instruction 1 in Fig. 14) and the next one (instruction 2 in Fig. 14) are read and processed. However, at the W stage of these two instructions, the results of their arithmetic operations are written to the register file only when the branch condition of the BRAcc instruction is not satisfied. In other words, when the branch condition is satisfied, the results are suppressed from being written.
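The delayed-branch handling of Figs. 13 and 14 can be sketched as two small functions. Both names are hypothetical. The text says only that the displacement d is added to the program counter; which PC value is used is not stated, so the sketch assumes the branch instruction's own address. The write-back rule follows the text: results of the two instructions read after a BRA are always written, while those read after a BRAcc are written only when the branch condition is not met.

```python
def branch_target(pc, displacement):
    """Target of BRA/BRAcc: displacement field d added to the program counter.

    Assumption: pc is the address of the branch instruction itself.
    """
    return pc + displacement


def commit_delay_slots(regs, slot_writes, conditional, condition_met):
    """W stage for the two instructions read after a branch.

    slot_writes: list of (destination register, value) from instructions 1 and 2.
    BRA (conditional=False): the results are always written.
    BRAcc (conditional=True): written only when the condition is NOT met.
    """
    if conditional and condition_met:
        return regs  # branch taken: writes suppressed
    for dest, value in slot_writes:
        regs[dest] = value
    return regs
```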

Thus, as shown in Figs. 11 to 14, this embodiment processes two instructions at a time in one machine cycle, which has the merit of up to doubling the processing ability. Moreover, since simple instructions are used and the number of pipeline stages is as small as four under wired-logic control, the overhead at the time of branching can be reduced to at most one machine cycle. In addition, if the delayed branch is optimized by the compiler, the overhead can be eliminated.

Moreover, since even complicated processing can be executed by a combination of simple instructions, the first arithmetic operation unit 110 and the second arithmetic operation unit 112 in Fig. 1 can operate in parallel with less idling than the address adder and ALU in the parallel pipeline of the conventional CISC-type computer. This aspect is explained a little further. When loads from memory to registers are repeated, the conventional CISC-type computer, as shown in Fig. 15, can load one piece of data per machine cycle. In contrast, this embodiment requires two instructions to load one piece of data, an ADD instruction for the address computation and a LOAD instruction that uses that address, but it can execute two instructions at a time in one machine cycle, as shown in Fig. 16, and thus can still load one piece of data per machine cycle. From the viewpoint of parallel operation, both computers operate two arithmetic operation units in parallel, so they are equivalent in this respect.

Figs. 17 and 18 compare even more complicated processing. Instruction 1, which, as shown in Fig. 17, requires six cycles at the EX stage in the conventional CISC-type computer, can be executed in three cycles in this embodiment, as shown in Fig. 18. This is because in the conventional CISC-type computer the address adder is idle during the execution of instruction 1, whereas in this embodiment the two arithmetic operation units can operate in parallel in every cycle.

Fig. 19 shows the construction of the first arithmetic operation unit 110 shown in Fig. 1. There are shown an ALU 1500, a barrel shifter 1501, and a flag generation circuit 1502. The data transferred through the buses 125 and 126 are processed by the ALU 1500 for addition, subtraction, and logic operations, and by the barrel shifter 1501 for the SFT instruction. The result of the processing is transmitted to the bus 130. A flag is produced by the flag generation circuit 1502 from the result of the arithmetic operation and output as the signal 123.

Fig. 20 shows one example of the construction of the second arithmetic operation unit 112 in Fig. 1. There are shown an ALU 1600 and a flag generation circuit 1601. The second arithmetic operation unit differs from the first in that it has no barrel shifter. This is because the SFT instruction occurs less frequently than the arithmetic and logic operation instructions. As a result, two SFT instructions cannot be executed in one machine cycle, but the amount of hardware can be reduced. The control method used when two SFT instructions appear is described later. Of course, the second arithmetic operation unit 112 may instead be the unit shown in Fig. 19.

Fig. 21 shows the construction of the register file 111 in Fig. 1. There are shown registers 1708 and bus switches 1700 to 1709. Each register has four read ports and two write ports. The bus switches are used to bypass the register file when the register specified by the destination field of the previous instruction is used immediately by the next instruction. For example, the bus switch 1702 is the bypass switch from the bus 129 to the bus 127, and it opens (enables the bypass path) when the destination register field D of the first instruction coincides with the first source register field S1 of the second instruction.
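The decision made by a bypass switch such as 1702 can be sketched as an operand-selection function. `select_operand` is a hypothetical name, and the sketch abstracts away timing: it assumes the previous result is already present on the result bus when the operand is selected.

```python
def select_operand(regs, src, prev_dest, prev_result):
    """Choose a source operand, bypassing the register file when needed.

    If the register named by the source field is the destination just
    written by the previous instruction, take the value from the result
    bus (e.g. bus 129 -> bus 127) instead of from the register file.
    prev_dest is None when there is no pending result.
    """
    if prev_dest is not None and src == prev_dest:
        return prev_result  # bypass path through the bus switch
    return regs[src]        # normal read from register file 111
```

A bypass avoids waiting for the value to be written into the register file and read back out, at the cost of one comparator and one switch per forwarding path.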

The method of eliminating conflicts between the first and second instructions is described with reference to Figs. 22 to 29. Depending on the combination of the first and second instructions, the two sometimes cannot be executed at the same time. This is called a conflict. A conflict occurs in the following cases.

(1) A load or store instruction appears as the second instruction.

(2) An SFT instruction appears as the second instruction.

(3) The register pointed to by the destination register field D of the first instruction coincides with the register specified by the first source register field S1 of the second instruction or with the register pointed to by the second source register field S2 of the second instruction.
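The three conflict conditions listed above can be checked by a small predicate over a pair of decoded instructions. The `Instr` record and the set of operations treated as not writing the D register are assumptions made for the sketch (CALL and RTN are conservatively treated as writing); the actual conflict detection circuit 2300 is not specified at this level of detail.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    """Decoded instruction fields (hypothetical representation)."""
    op: str
    s1: int = 0
    s2: int = 0
    d: int = 0


# Cases (1) and (2): operations the second arithmetic operation unit 112
# cannot execute in this embodiment (no second MDR/MAR, no barrel shifter).
FIRST_UNIT_ONLY = {"LOAD", "STORE", "SFT"}

# Assumption: these operations do not write the register named by D.
NO_REGISTER_WRITE = {"STORE", "NOP", "BRA", "BRAcc"}


def has_conflict(first, second):
    """Return True if the pair cannot be executed in the same machine cycle."""
    if second.op == "NOP":
        return False
    # Cases (1) and (2): load, store, or SFT as the second instruction.
    if second.op in FIRST_UNIT_ONLY:
        return True
    # Case (3): D of the first instruction matches S1 or S2 of the second.
    return first.op not in NO_REGISTER_WRITE and first.d in (second.s1, second.s2)
```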

The above cases (1) and (2) are problems peculiar to this embodiment, arising because the load, store, and SFT instructions cannot be processed by the second arithmetic operation unit. If, in Fig. 1, a second MDR is added to the bus 127 and a second MAR to the bus 128, and two pieces of data are accessed in one machine cycle through the memory interface, conflict condition (1) can be eliminated. Likewise, if a barrel shifter is provided in the second arithmetic operation unit, conflict condition (2) can be eliminated. In this embodiment, these conflict conditions arise because the hardware has been reduced. Since such conflicts can easily be eliminated as described later, only the hardware for the instructions that are to be executed simultaneously need be duplicated, according to the required performance and the allowable amount of hardware, so that the hardware is reduced with substantially no loss of performance.
The control method to be used when the SFT instruction appears as the second instruction will be mentioned with reference to Fig. 22.

The upper part of Fig. 22 shows the case where the SFT instruction is located at address "3", in the second-instruction position. The lower part of Fig. 22 shows the instructions stored in the first and second instruction registers at the time of execution. When the program counter is 2, the hardware detects that the second instruction is the SFT instruction; the instruction at address 2 is set in the first instruction register, and the NOP instruction is set in the second instruction register. In the next machine cycle, the program counter is incremented by 1, that is, address 3 is set in the program counter. The SFT instruction at address 3 is then set in the first instruction register, and the NOP instruction in the second instruction register. Thus, the processing is performed correctly in two separate machine cycles. Of course, the compiler preferably optimizes the code so that, where possible, the SFT instruction does not appear in this position.
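The split execution of Fig. 22 can be sketched as an issue function that is called once per machine cycle. `issue`, `run` and the `is_conflict` predicate are hypothetical; the predicate used in the example flags only an SFT in the second position. For an odd PC reached without a split (e.g. a branch target), the text places the instruction in the second register; the sketch additionally assumes that an instruction the second unit cannot execute is placed in the first register instead, a case the text does not describe.

```python
NOP = "NOP"


def issue(memory, pc, after_split, is_conflict):
    """One machine cycle of issue. Returns (first_ir, second_ir, next_pc, split)."""
    if after_split:
        # Second half of a split pair: the deferred instruction runs alone
        # in the first instruction register (Fig. 22, PC = 3).
        return memory[pc], NOP, pc + 1, False
    if pc % 2 == 1:
        # Odd PC: normally NOP + instruction. Assumption: an instruction the
        # second unit cannot execute goes to the first register instead.
        if is_conflict(NOP, memory[pc]):
            return memory[pc], NOP, pc + 1, False
        return NOP, memory[pc], pc + 1, False
    first, second = memory[pc], memory[pc + 1]
    if is_conflict(first, second):
        # Execute the first instruction now; PC advances by only 1.
        return first, NOP, pc + 1, True
    return first, second, pc + 2, False


def run(memory, pc, cycles, is_conflict):
    """Return the (first, second) register contents for each cycle."""
    trace, split = [], False
    for _ in range(cycles):
        first, second, pc, split = issue(memory, pc, split, is_conflict)
        trace.append((first, second))
    return trace


if __name__ == "__main__":
    program = ["ADD", "ADD", "ADD", "SFT", "ADD", "ADD"]
    print(run(program, 2, 3, lambda a, b: b == "SFT"))
```

Starting at PC = 2, the pair (ADD, SFT) is split, so the trace is (ADD, NOP), (SFT, NOP), (ADD, ADD), matching the two separate machine cycles described for Fig. 22.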

Another method of eliminating the conflict is described with reference to Fig. 23. The SFT instruction is prevented from being stored at an odd address, that is, in the second-instruction position, and when there is no other instruction to be executed there, the NOP instruction is stored instead. The program size is thus slightly increased, but the hardware for eliminating the conflict can be omitted.

Fig. 24 shows the processing method used when a load instruction appears as the second instruction; here the load instruction is stored at address 3. The processing method is the same as for the SFT instruction.

Fig. 25 shows the processing method used when a register conflict occurs: the instruction at address 2 stores its result in register 8, and the instruction at address 3 reads the same register 8. In this case, as with the SFT instruction, the pair is executed in two separate machine cycles.
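The three conflict cases of Figs. 22, 24 and 25 can be summarized as one predicate on an instruction pair. This is a minimal sketch, assuming (as in these embodiments) that SFT and memory accesses can only be executed from the first-instruction position; the `Instr` record and the `pair_conflicts` name are hypothetical and do not reflect the patent's instruction encoding.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record (not the Fig. 10 bit format)."""
    op: str                        # mnemonic from Fig. 9, e.g. "ADD", "SFT"
    srcs: Tuple[int, ...] = ()     # source register numbers
    dest: Optional[int] = None     # destination register number, if any


# Assumption: these operations may only be executed from the first slot.
FIRST_SLOT_ONLY = frozenset({"SFT", "LOAD", "STOR"})


def pair_conflicts(first: Instr, second: Instr) -> bool:
    """True if first (even address) and second (odd address) must be split."""
    if second.op in FIRST_SLOT_ONLY:          # Figs. 22 and 24
        return True
    if first.dest is not None and first.dest in second.srcs:
        return True                           # Fig. 25: register conflict
    return False


add_r8 = Instr("ADD", srcs=(1, 2), dest=8)
print(pair_conflicts(add_r8, Instr("SUB", srcs=(8, 3), dest=4)))  # True
print(pair_conflicts(add_r8, Instr("SUB", srcs=(5, 3), dest=4)))  # False
```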

For load and store instructions and for register conflicts as well, the conflict can be eliminated by inhibiting the offending instructions from being stored at odd addresses. The effect is the same as described for the SFT instruction.
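The software alternative of Fig. 23, extended to load and store instructions, amounts to a layout pass that inserts NOP instructions. This is a minimal sketch under stated assumptions: instructions are plain mnemonic strings, program order is preserved (no reordering is attempted), register conflicts are not modeled, and the name `pad_for_pairs` is hypothetical.

```python
from typing import List

NOP = "NOP"
# Assumption: these mnemonics must never occupy an odd (second-slot) address.
FIRST_SLOT_ONLY = frozenset({"SFT", "LOAD", "STOR"})


def pad_for_pairs(instrs: List[str]) -> List[str]:
    """Hypothetical pass: order-preserving layout with no FIRST_SLOT_ONLY op at an odd address."""
    out: List[str] = []
    for op in instrs:
        if op in FIRST_SLOT_ONLY and len(out) % 2 == 1:
            out.append(NOP)        # fill the odd slot so op moves to an even one
        out.append(op)
    return out


print(pad_for_pairs(["ADD", "SFT", "SUB", "LOAD"]))
# ['ADD', 'NOP', 'SFT', 'SUB', 'LOAD']
```

The single added NOP is the slight increase in program size mentioned for Fig. 23; a scheduling compiler could instead try to fill the odd slot with an independent instruction.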

A description will now be given of the hardware for realizing the processing described with reference to Figs. 22 to 25. Fig. 26 shows the construction of the instruction unit 103 in Fig. 1. There are shown a conflict detection circuit 2300, a cache memory 2301, a first mask circuit 2302, and a second mask circuit 2303. The contents of the program counter are normally input through the bus 113, and the instruction pointed to by the program counter and the instruction at the next address are output to the buses 2305 and 2306. On a cache miss, the instruction is fetched through the memory interface 100 and written through the bus 113 into the cache 2301. At this time, the conflict detection circuit checks whether a conflict exists between the first and second instructions; if it does, the conflict signal 2304 is asserted. The cache provides, for each pair of instructions, a bit indicating the conflict condition of the two instructions, and on a cache miss the conflict signal 2304 is stored in this bit. The first mask circuit receives the first instruction, the second instruction, the conflict bit, and the least significant bit of the program counter, and controls the signal 115 to the first instruction register 104 as shown in Fig. 27. The second mask circuit receives the second instruction, the conflict bit, and the least significant bit of the program counter, and supplies the signal 117 to the second instruction register 105 as shown in Fig. 27.

As shown in Fig. 27, when the conflict bit and the least significant bit of the PC are both 0, the first instruction is fed to the first instruction register and the second instruction to the second instruction register; this is the normal case. When the conflict bit is 1 and the least significant bit of the PC is 0, the first instruction is fed to the first instruction register and the NOP instruction to the second instruction register; this is the first machine cycle of processing a conflicting pair. When the conflict bit is 1 and the least significant bit of the PC is 1, the second instruction is fed to the first instruction register and the NOP instruction to the second instruction register; this is the second machine cycle of processing a conflicting pair. In this way, the process flow for conflicting instructions described with reference to Figs. 22, 23, and 25 can be realized.
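The selection rules of Fig. 27 can be written as a small function. It is a behavioural sketch, not a gate-level design; the output for an odd PC with the conflict bit at 0 is inferred from the statement that a branch to an odd address is handled irrespective of the conflict bit, and the name `mask_circuits` is hypothetical.

```python
from typing import Tuple

NOP = "NOP"


def mask_circuits(first: str, second: str, conflict_bit: int,
                  pc_lsb: int) -> Tuple[str, str]:
    """Hypothetical model of mask circuits 2302/2303: returns the inputs to
    the first (signal 115) and second (signal 117) instruction registers."""
    if pc_lsb == 1:
        # Second cycle of a conflicting pair, or a branch to an odd address:
        # only the second instruction is effective, whatever the conflict bit.
        return second, NOP
    if conflict_bit == 1:
        # First cycle of a conflicting pair.
        return first, NOP
    # Normal case: both instructions are issued together.
    return first, second


for conflict_bit in (0, 1):
    for pc_lsb in (0, 1):
        print(conflict_bit, pc_lsb, mask_circuits("I0", "I1", conflict_bit, pc_lsb))
```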

When a branch instruction branches to an odd address, only the second instruction is made effective, as shown in Fig. 27, irrespective of the conflict bit, so that correct processing is possible. The cache is read in every cycle but written only on a cache miss, and the write then takes several machine cycles. Thus, if the conflict detection circuit operates when the cache is written and the conflict bit is kept in the cache, the machine cycle can effectively be shortened.

Fig. 28 shows the construction of the instruction cache 2301 in Fig. 26. There are shown a directory 2500, a data memory 2501, a selector 2502, an address register 2503, a write register 2504, a comparator 2505, and a cache control circuit 2506. The cache in Fig. 28 has substantially the same construction as a normal cache, but differs in that the data memory 2501 provides a conflict-bit field for each two-instruction (8-byte) unit, and in that when the cache is read, the least significant bit (bit 0) of the PC is ignored, so that the first instruction 2305, the second instruction 2306, and the conflict signal 116 are output.

In Fig. 28, the data memory holds 8K words, and the block size is 32 bytes (8 words). The signal 113 from the program counter is set in the address register 2503. The directory 2500 and the data memory 2501 are addressed by bits 3 to 12 of the address. The comparator 2505 compares the output of the directory with bits 13 to 31 of the address register; if they do not match, a signal 2508 is supplied to the cache control circuit 2506. The cache control circuit 2506 then reads the block containing the missed instruction from the main memory and sets it in the data memory 2501. The selector 2502 receives bits 1 and 2 of the address register and selects the two required instructions from the block. The first and second instructions always lie within the same block, so one of them can never miss while the other hits.
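The address split used by Fig. 28 can be checked with a few lines of code. This sketch assumes that addresses are word (instruction) addresses, which is consistent with the program counter advancing by 1 per instruction and with 8K words of data memory in 32-byte (8-word) blocks; bit 0 is taken as the least significant bit, and `split_address` is a hypothetical name.

```python
from typing import NamedTuple


class CacheFields(NamedTuple):
    tag: int    # bits 13..31, compared with the directory output
    index: int  # bits 3..12, selects one of 1024 blocks (1024 x 8 words = 8K words)
    pair: int   # bits 1..2, selector input: one of 4 instruction pairs per block
    slot: int   # bit 0, ignored when reading (both instructions are output)


def split_address(word_addr: int) -> CacheFields:
    """Hypothetical helper splitting a 32-bit word address into cache fields."""
    return CacheFields(
        tag=(word_addr >> 13) & ((1 << 19) - 1),
        index=(word_addr >> 3) & ((1 << 10) - 1),
        pair=(word_addr >> 1) & 0b11,
        slot=word_addr & 1,
    )


print(split_address(0x12345))
# CacheFields(tag=9, index=104, pair=2, slot=1)
```

Because the pair is selected by bits 1 and 2 inside an aligned 8-word block, both instructions of a pair share the same tag and index, which is why one of them cannot miss while the other hits.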

Fig. 29 shows another construction of the instruction unit 103 in Fig. 1. There are shown a cache memory 2600, a conflict detection circuit 2604, a first mask circuit 2302, and a second mask circuit 2303. The construction shown in Fig. 29 differs from that of Fig. 26 in that the cache has no field for holding the conflict bit, and in that the first instruction 2601 and the second instruction 2602 output from the cache are checked in every cycle by the conflict detection circuit 2604. The first mask circuit 2302 and the second mask circuit 2303 operate in the same way as in Fig. 26. In this embodiment, since the per-cycle conflict detection circuit operates after the cache is read, the machine cycle is lengthened, but the conflict-bit field can be omitted from the cache.

Moreover, according to this invention, by making effective use of the fact that two instructions are processed simultaneously in one machine cycle, a conditional branch instruction can be processed faster in a special case. That is, when the branch target of a conditional branch instruction, taken if the condition is satisfied, is the instruction after next (instruction 2 in Fig. 30), instructions 2 and 3 are executed regardless of whether the condition is satisfied, and whether the W stage of instruction 1 is suppressed is controlled by whether the condition is satisfied. The waiting cycle is thus eliminated when the condition is met. In this case, however, the conditional branch instruction must always be placed in the first-instruction position. In normal conditional branching, one waiting cycle occurs when the condition is satisfied, as described with reference to Fig. 14. In other words, since this invention processes two instructions simultaneously in one machine cycle, the execution of the instruction in the second-instruction position can be controlled by whether the condition of the conditional branch instruction in the first-instruction position is satisfied, without affecting the process flow of the two-instruction units.
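The special conditional-branch case can be modeled at the level of register writes. This is a minimal sketch under stated assumptions: each machine cycle holds one pair whose first slot may be a conditional branch to the instruction after next, the second slot is reduced to a single register write, the effects of other first-slot instructions are not modeled, and all names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Write = Tuple[str, int]  # hypothetical second-slot effect: (register, value)


def run_pairs(pairs: List[Tuple[Optional[bool], Write]],
              regs: Dict[str, int]) -> Tuple[Dict[str, int], int]:
    """Hypothetical model: execute (branch_condition, second_slot_write) pairs,
    one pair per machine cycle.

    branch_condition is None when the first slot holds no such branch;
    True means the branch is taken, which suppresses only the W stage
    of the second-slot instruction.
    Returns the final registers and the number of machine cycles used.
    """
    regs = dict(regs)
    cycles = 0
    for cond, (reg, value) in pairs:
        cycles += 1
        if cond is not True:          # W stage not suppressed
            regs[reg] = value
        # The next pair issues in the next cycle either way:
        # no waiting cycle is inserted for a taken branch.
    return regs, cycles


start = {"R1": 0, "R2": 0}
print(run_pairs([(True, ("R1", 5)), (None, ("R2", 7))], start))
print(run_pairs([(False, ("R1", 5)), (None, ("R2", 7))], start))
```

Both runs take the same number of cycles; the condition only decides whether the skipped instruction's result is written.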

Moreover, in this embodiment, by making effective use of the simultaneous processing of two instructions in one machine cycle, "atomic" processing can easily be realized. Atomic processing is processing that is always performed as an uninterrupted sequence; it is used for synchronization between processes. Fig. 31A shows the processing in a conventional computer, and Fig. 31B shows that in this embodiment. In Fig. 31A, an interrupt may occur between any two instructions, whereas in Fig. 31B no interrupt occurs between instructions 1 and 2 or between instructions 3 and 4. Thus, in Fig. 31A a program for another process may be entered between arbitrary instructions, while Fig. 31B has the merit that instructions 1 and 2, or instructions 3 and 4, are guaranteed to be executed in sequence.
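The guarantee of Fig. 31B follows if interrupt requests are recognized only at machine-cycle boundaries. The sketch below assumes exactly that; the function name and the "IRQ" marker are hypothetical.

```python
from typing import List, Tuple


def execute_with_interrupt(pairs: List[Tuple[str, str]], irq_cycle: int,
                           handler: str = "IRQ") -> List[str]:
    """Hypothetical model: the executed instruction stream when an interrupt
    is requested during machine cycle irq_cycle.

    Interrupts are accepted only at machine-cycle boundaries, i.e. after
    both instructions of a pair, so a pair is never split.
    """
    stream: List[str] = []
    for cycle, (first, second) in enumerate(pairs):
        stream += [first, second]
        if cycle == irq_cycle:
            stream.append(handler)     # serviced at the end of this cycle
    return stream


print(execute_with_interrupt([("I1", "I2"), ("I3", "I4")], irq_cycle=0))
# ['I1', 'I2', 'IRQ', 'I3', 'I4']
```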

Fig. 32 shows the construction of another embodiment of this invention, in which 4 instructions can be processed simultaneously in one machine cycle. There are shown a memory interface 3200, a program counter 3201, a sequencer 3202, an instruction unit 3203, first to fourth instruction registers 3204 to 3207, first to fourth decoders 3208 to 3211, an MDR 3212, an MAR 3213, first to fourth arithmetic operation units 3214, 3215, 3217, and 3218, and a register file 3216. The arithmetic operation units share the register file 3216. Each portion operates in the same way as in the embodiment shown in Fig. 1 and will not be described again.

Similarly, the degree of parallelism can be increased further. However, since some programs contain a branch instruction every few instructions, an extreme increase in the degree of parallelism is not very effective for such programs, and it is preferable to process about 2 to 4 instructions at a time. For programs with few branches and few conflicts, a further increase in the degree of parallelism does increase performance effectively. Moreover, if the degree of parallelism is chosen to be 2^n (n being a natural number), the instruction unit can easily be controlled.
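One plausible illustration of why a power-of-two degree of parallelism simplifies control (an interpretation, not spelled out in the text): with word addresses, the aligned group address and the position within the group are obtained by masking the program counter, whereas other widths need a division. The function name is hypothetical.

```python
def group_and_slot(pc: int, n: int) -> tuple:
    """Hypothetical helper: split a word-address PC for an instruction unit
    issuing 2**n instructions per cycle.

    Returns (group_base, slot): the aligned address of the group fetched
    together and the position of pc within it, using only bit masks.
    """
    width = 1 << n
    return pc & ~(width - 1), pc & (width - 1)


print(group_and_slot(13, 1))  # (12, 1): two instructions per cycle
print(group_and_slot(13, 2))  # (12, 1): four instructions per cycle (Fig. 32)
print(divmod(13, 3))          # (4, 1): a width of 3 needs a true division
```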

Still another embodiment of this invention will now be described. In the embodiments described so far, a plurality of instructions are always processed simultaneously. It is also advantageous to process one instruction per machine cycle normally and to process a plurality of instructions simultaneously only in certain cases. Fig. 33 shows three examples. In the example of Fig. 33a, the first instructions are stored in the main memory, and second instructions are stored only in the head portion of the address space, in a ROM. In the example of Fig. 33b, both first and second instructions are stored in the head portion of the address space, in a ROM, and only first instructions are stored in the other portions of the main memory. The example of Fig. 33c is substantially the same as that of Fig. 33a, except that the second instructions stored in the ROM are placed in an intermediate portion of the address space. The overall construction of the computer is the same as in Fig. 1; only the instruction unit 103 needs to be changed. A frequently used program with a high degree of parallelism is written into the ROM portion and executed by a subroutine call from a routine. Since the ROM portion can be of small capacity, an optimal program can be produced with an assembler, even without a compiler.

Fig. 34 shows the construction of the instruction unit 103 in Fig. 1 for realizing the example of Fig. 33a. There are shown a cache 2900, a 4K-word ROM 2901, a mask circuit 2903, and a mask circuit control circuit 2902. The mask circuit control circuit always monitors the address 113 and asserts an effective signal 2904 only when the more significant bits 12 to 31 of the address are all zero. Only while the effective signal 2904 is asserted does the mask circuit 2903 supply the ROM output 2905 to the second instruction register as the output 117; at other times, the NOP instruction is supplied.

To realize the example of Fig. 33c, the mask circuit control circuit 2902 shown in Fig. 34 is constructed as shown in Fig. 35. There are shown a comparator 3000 and a base register 3001. When the more significant bits 12 to 31 of the base register coincide with bits 12 to 31 of the address 113, the comparator 3000 asserts the effective signal 2904.
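The effective signal of Figs. 34 and 35 and the resulting input to the second instruction register can be sketched as follows. The sketch assumes 32-bit word addresses and a 4K-word ROM addressed by bits 0 to 11; the function names and the ROM contents are hypothetical.

```python
from typing import List, Optional

NOP = "NOP"
UPPER_MASK = 0xFFFFF  # bits 12..31 of a 32-bit address, after shifting by 12


def effective_signal(addr: int, base: Optional[int] = None) -> bool:
    """Hypothetical model of signal 2904: Fig. 34 when base is None,
    Fig. 35 (comparison with base register 3001) otherwise."""
    if base is None:
        return ((addr >> 12) & UPPER_MASK) == 0             # bits 12..31 all zero
    return (((addr ^ base) >> 12) & UPPER_MASK) == 0        # bits 12..31 match


def second_register_input(addr: int, rom: List[str],
                          base: Optional[int] = None) -> str:
    """Hypothetical model of output 117: ROM word if signal 2904, else NOP."""
    return rom[addr & 0xFFF] if effective_signal(addr, base) else NOP


rom = [f"ROM{i}" for i in range(4096)]                   # stand-in for ROM 2901
print(second_register_input(0x0005, rom))                # ROM5
print(second_register_input(0x1005, rom))                # NOP
print(second_register_input(0x3005, rom, base=0x3000))   # ROM5
```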

To realize the example of Fig. 33b, the instruction unit 103 shown in Fig. 1 is constructed as shown in Fig. 36. The ROM 2901, the mask circuit control circuit 2902, and the mask circuit 2903 function in the same way as the components with the same reference numbers in Fig. 34. In Fig. 36 there are also shown a cache 3100, a 4K-word ROM 3101, a selector control circuit 3102, and a selector 3107. The selector control circuit 3102 always monitors the more significant bits 12 to 31 of the address 113 and asserts a ROM selection signal 3105 only when all of these bits are 0. Only while the ROM selection signal 3105 is asserted does the selector 3107 supply the ROM output signal 3104 to the first instruction register as the output 115; at other times, the cache output 3103 is supplied.
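For the Fig. 33b arrangement, the two register inputs can be sketched together. This assumes that both ROMs are addressed by bits 0 to 11 and that the ROM selection signal 3105 and the effective signal 2904 are derived from the same condition on bits 12 to 31; `fetch_pair` and the ROM contents are hypothetical.

```python
from typing import List, Tuple

NOP = "NOP"


def fetch_pair(addr: int, cache_out: str, rom_first: List[str],
               rom_second: List[str]) -> Tuple[str, str]:
    """Hypothetical model: (output 115, output 117) for the Fig. 33b arrangement."""
    in_rom = ((addr >> 12) & 0xFFFFF) == 0     # signals 3105 and 2904 (assumed equal)
    offset = addr & 0xFFF                      # word within the 4K-word ROMs
    first = rom_first[offset] if in_rom else cache_out   # selector 3107
    second = rom_second[offset] if in_rom else NOP       # mask circuit 2903
    return first, second


rom_first = [f"A{i}" for i in range(4096)]     # stand-in for ROM 3101
rom_second = [f"B{i}" for i in range(4096)]    # stand-in for ROM 2901
print(fetch_pair(0x0010, "C", rom_first, rom_second))  # ('A16', 'B16')
print(fetch_pair(0x2010, "C", rom_first, rom_second))  # ('C', 'NOP')
```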

As described with reference to Figs. 33 to 36, the hardware can be reduced by processing a plurality of instructions simultaneously only for a certain portion of the program and implementing that portion as a ROM. Also, since an optimal design can be achieved with an assembler for the ROM portion alone, there is the merit that no compiler needs to be developed for the simultaneous processing of a plurality of instructions. Moreover, by rewriting the ROM portion, high-speed operation suited to each application can be realized.

According to this invention, since complicated instructions are decomposed into basic instructions and a plurality of instructions are read and executed simultaneously in one machine cycle, a plurality of arithmetic operation units can be operated at once, which increases the processing capability.

Moreover, since the instructions have simple functions, the number of pipeline stages can be reduced, and the overhead on branching can therefore be kept small.

Furthermore, since a plurality of arithmetic opera- 
tion units are operated in parallel, the processing time 
for a complicated process can be decreased. 



Claims 

1. Data processor comprising

- an instruction unit (103) receiving instructions from a memory interface (100),
- a sequencer (102) for controlling a program counter (101) in order to read instructions from the instruction unit (103),
- at least two instruction registers (104, 105) for storing first and second instructions being read out from the instruction unit (103), and
- at least two arithmetic operation units (110, 112) for carrying out parallel arithmetic operations in accordance with instructions read from the instruction registers (104, 105),

the instruction unit (103) being provided with detection means (2300) for generating information used for determining whether the instructions can be processed in parallel, with a cache memory (2301) for storing the instructions and the information generated by the detection means (2300), and with means (2302, 2303) being responsive to the information output from the cache memory (2301) along with the instructions for controlling the sequence of execution by the arithmetic operation units (110, 112).

2. Data processor according to claim 1, further comprising detection means (2300) provided between the memory interface (100) and the cache memory (2301).

3. Data processor according to claim 1 or 2, further comprising that the cache memory (2301) outputs at least two instructions in the same machine cycle, wherein, when the detection means (2300) detects a conflict, one instruction is executed as a NOP instruction.

4. Data processor according to at least one of claims 1 to 3, further comprising that the detection means (2300) is constructed to access a memory only once in one machine cycle and detects said conflict by deciding whether each of one instruction and a subsequent instruction is a load instruction or a store instruction.

5. Data processor according to at least one of claims 1 to 4, further comprising that the detection means (2300) detects the conflict on the basis of the fact that some of the at least one arithmetic operation units (110, 112) have a barrel shifter (1501), that at least one of said one instruction and said subsequent instruction is a bit shift instruction, and that the arithmetic operation unit (110, 112) for processing said bit shift instruction has no barrel shifter (1501).

6. Data processor according to at least one of claims 1 to 5, further comprising that the detection means (2300) includes means for writing, within the cache memory (2301), a conflict bit indicating said conflict for every m instructions which are read at a time.

7. Data processor according to claim 6, further comprising that an instruction for altering a status flag and a branch instruction with a satisfied branch condition are included in the m instructions read by the program counter (101), and that means are provided for detecting the presence of a conditional branch instruction at a predetermined address to suppress execution of instructions after a corresponding address of the m instructions when the condition is satisfied.

8. Data processor according to at least one of claims 1 to 7, further comprising that a part (2401) of the memory in which instructions are stored is formed of a ROM.

9. Data processor according to at least one of claims 1 to 8, further comprising that any interrupt is inhibited between the m instructions which are read at a time.

10. Method for processing data, comprising the steps of

- receiving instructions from a memory interface (100) by an instruction unit (103),
- controlling a program counter (101) by a sequencer (102) in order to read instructions from the instruction unit (103),
- storing first and second instructions being read out from the instruction unit (103) in at least two instruction registers (104, 105), and
- carrying out parallel arithmetic operations in accordance with instructions read from the instruction registers (104, 105) by at least two arithmetic operation units (110, 112),
- generating information by detection means (2300) used for determining whether the instructions can be processed in parallel,
- storing the instructions and the information generated by the detection means (2300) in a cache memory (2301),
- controlling the sequence of execution of the instructions based on the information output from the cache memory (2301), and
- parallel processing the instructions, when the information output along with the instructions indicates that the instructions can be processed in parallel without conflict.

11. Method according to claim 10, further comprising that the detection means (2300) is provided between the memory interface (100) and the cache memory (2301).

12. Method according to claim 10 or 11, further comprising that the cache memory (2301) outputs at least two instructions in the same machine cycle, wherein, when the detection means (2300) detects a conflict, one instruction is executed as a NOP instruction.

Patent claims (German text)

1. Data processor comprising:

- an instruction unit (103) for receiving instructions from a memory interface (100),
- a sequencer (102) for controlling a program counter (101) in order to read instructions from the instruction unit (103),
- at least two instruction registers (104, 105) for storing a first and a second instruction read out from the instruction unit (103), and
- at least two arithmetic units (110, 112) for executing parallel arithmetic operations in accordance with instructions read out from the instruction registers (104, 105),

wherein the instruction unit (103) is equipped with a detection device (2300) for generating information that is used to determine whether the instructions can be processed in parallel, with a cache memory (2301) for storing instructions and the information generated by the detection device (2300), and with devices (2302, 2303) which respond to the information output from the cache memory (2301) together with instructions, for controlling the sequence of processing by the arithmetic units (110, 112).

2. Data processor according to claim 1, which further comprises a detection device (2300) arranged between the memory interface (100) and the cache memory (2301).



EP 0 368 332 B1 






3. Data processor according to claim 1 or 2, which further includes that the cache memory (2301) outputs at least two instructions in the same machine cycle, wherein, if the detection device (2300) detects a conflict, one instruction is executed as a NOP instruction.

4. Data processor according to one of claims 1 to 3, which further includes that the detection device (2300) is constructed such that a memory is accessed only once in one machine cycle, and the conflict is detected by deciding whether each individual instruction and a following instruction is a load instruction or a store instruction.

5. Data processor according to one of claims 1 to 4, which further includes that the detection device (2300) detects the conflict on the basis of the fact that some of the at least one arithmetic unit (110, 112) have a shift register (1501), that at least one of an instruction and the following instruction is a bit shift instruction, and that the arithmetic unit (110, 112) for processing the bit shift instruction has no shift register (1501).

6. Data processor according to one of claims 1 to 5, which further includes that the detection device (2300) comprises a device for writing a conflict bit which indicates the conflict for all m instructions that were read at the same time within the cache memory (2301).

7. Data processor according to claim 6, which further includes that an instruction for changing a status flag and a branch instruction with a satisfied branch condition are contained in the m instructions read by the program counter (101), and that devices are provided for detecting the presence of a conditional branch instruction at a predetermined address in order to suppress the execution of instructions after a corresponding address of the m instructions when the condition is fulfilled.

8. Data processor according to one of claims 1 to 7, which further includes that a part (2401) of the memory in which the instructions are stored is a ROM.

9. Data processor according to one of claims 1 to 8, which further includes that every interrupt is prohibited between the m instructions that were read at one time.

10. Method for processing data, comprising the steps of:

- receiving instructions from a memory interface (100) by an instruction unit (103),
- controlling a program counter (101) by a sequencer (102) in order to read instructions from the instruction unit (103),
- storing a first and a second instruction read out from the instruction unit (103) in at least two instruction registers (104, 105), and
- executing parallel arithmetic operations in accordance with instructions read out from the instruction registers (104, 105) by at least two arithmetic units (110, 112),
- generating information by the detection device (2300), which is used to determine whether the instructions can be processed in parallel,
- storing instructions and the information generated by the detection device (2300) in a cache memory (2301),
- controlling the sequence of processing of the instructions on the basis of the information output from the cache memory (2301), and
- processing the instructions in parallel when the information output together with the instructions indicates that the instructions can be processed in parallel without conflict.

11. Method according to claim 10, which further includes that the detection device (2300) is located between the memory interface (100) and the cache memory (2301).

12. Method according to claim 10 or 11, which further includes that the cache memory (2301) outputs at least two instructions in the same machine cycle, wherein, if the detection device (2300) detects a conflict, one instruction is executed as a NOP instruction.

Claims (French text)

1. Data processor comprising

- an instruction unit (103) receiving instructions from a memory interface (100),
- a sequencer (102) intended to control an instruction counter (101) in order to read instructions from the instruction unit (103),
- at least two instruction registers (104, 105) intended to store first and second instructions read from the instruction unit (103), and
- at least two arithmetic units (110, 112) for performing arithmetic operations in parallel in accordance with the instructions read from the instruction registers (104, 105),

the instruction unit (103) comprising detection means (2300) for producing information used to determine whether the instructions can be processed in parallel, a cache memory (2301) for storing the instructions and the information produced by the detection means (2300), and means (2302, 2303) which respond to the information output by the cache memory (2301), as well as to the instructions, to control the execution sequence followed by the arithmetic units (110, 112).

2. Data processor according to claim 1, further comprising detection means (2300) arranged between the memory interface (100) and the cache memory (2301).

3. Data processor according to claim 1 or 2, in which, further, the cache memory (2301) outputs at least two instructions during the same machine cycle, and in which, when the detection means (2300) detect a conflict, one instruction is executed as a NOP instruction.

4. Data processor according to at least one of claims 1 to 3, in which, further, the detection means (2300) are designed to access a memory only once during a machine cycle and detect said conflict by determining whether each of a first instruction and a following instruction is a load instruction or a store instruction.

5. Data processor according to at least one of claims 1 to 4, in which, further, the detection means (2300) detect the conflict on the basis of the fact that some of the arithmetic units (110, 112), of which there is at least one, comprise a barrel shifter (1501), that at least one of said first instruction and said following instruction is a one-bit shift instruction, and that the arithmetic unit (110, 112) intended to process said one-bit shift instruction does not comprise a barrel shifter (1501).

6. Data processor according to at least one of claims 1 to 5, in which, further, the detection means (2300) comprise means for writing, in the cache memory (2301), a conflict bit indicating said conflict for each group of m instructions that are read at the same time.

7. Data processor according to claim 6, in which, further, a status-flag change instruction and a branch instruction associated with a satisfied branch condition are included in the m instructions read by the instruction counter (101), and means are arranged to detect the presence of a conditional branch instruction at a predetermined address in order to suppress the execution of the instructions located after a corresponding address of the m instructions when the condition is satisfied.

8. Data processor according to at least one of claims 1 to 7, in which, further, a part (2401) of the memory in which the instructions are stored is formed of a ROM.

9. Data processor according to at least one of claims 1 to 8, in which, further, any interrupt is inhibited between the m instructions that are read at the same time.

10. Method for processing data, comprising the steps of

- receiving instructions from a memory interface (100) by an instruction unit (103),
- controlling an instruction counter (101) by means of a sequencer (102) so as to read instructions from the instruction unit (103),
- storing the first and second instructions read from the instruction unit (103) in at least two instruction registers (104, 105), and
- performing arithmetic operations in parallel in accordance with the instructions read from the instruction registers (104, 105) by at least two arithmetic units (110, 112),
- producing, by means of detection means (2300), information used to determine whether the instructions can be processed in parallel,
- storing the instructions and the information produced by the detection means (2300) in a cache memory (2301),
- controlling the execution sequence of the instructions on the basis of the information output from the cache memory (2301), and
- processing the instructions in parallel when the information output in association with the instructions indicates that the instructions can be processed in parallel without conflict.

11. Method according to claim 10, in which, further, the detection means (2300) are arranged between the memory interface (100) and the cache memory (2301).

12. Method according to claim 10 or 11, in which, further, the cache memory (2301) outputs at least two instructions during the same machine cycle, and in which, when the detection means (2300) detect a conflict, one instruction is executed as a NOP instruction.






FIG. 2 (PRIOR ART)



FIG. 3 (PRIOR ART): pipeline timing chart of successive ADD instructions and of a conditional branch (BRAcc) followed by the jumped-to instruction; stages IF, D, A, OF, EX, W.



FIG. 4 (PRIOR ART): pipeline timing chart of a flag-generating instruction and a flag-testing instruction, showing a waiting cycle; stages IF, D, A, OF, FLAG, W.



FIG. 5
PRIOR ART
[Pipeline timing chart: one instruction occupies several execution (E) cycles, and INSTRUCTION 2 (stages IF, D, A, OF, E, W) incurs a WAITING CYCLE.]






FIG. 6
PRIOR ART





FIG. 9

TYPES                   MNEMONIC                   OPERATION
BASIC INSTRUCTION       ADD R(S1), R(S2), R(D)     R(S1) + R(S2) → R(D)
                        SUB R(S1), R(S2), R(D)     R(S1) - R(S2) → R(D)
                        AND R(S1), R(S2), R(D)     STORE LOGICAL PRODUCT OF EACH BIT OF R(S1), R(S2) IN R(D)
                        OR R(S1), R(S2), R(D)      STORE LOGICAL SUM OF EACH BIT OF R(S1), R(S2) IN R(D)
                        EOR R(S1), R(S2), R(D)     STORE EXCLUSIVE OR OF EACH BIT OF R(S1), R(S2) IN R(D)
                        NOT R(S1), R(D)            STORE LOGICAL NOT OF EACH BIT OF R(S1) IN R(D)
                        SFT R(S1), R(S2), R(D)     SHIFT R(S1) BY BIT NUMBER INDICATED BY R(S2) AND STORE IN R(D)
                        NOP                        DO NOTHING
BRANCH INSTRUCTION      BRA d                      PC + d → PC
                        BRAcc d                    [illegible in source]
                        CALL d                     PC → R(0), PC + d → PC
                        RTN d                      R(0) → PC
LOAD STORE INSTRUCTION  STOR R(S1), R(S2)          WRITE R(S1) IN MEMORY POINTED TO BY R(S2)
                        LOAD R(S1), R(D)           WRITE DATA OF MEMORY POINTED TO BY R(S1) IN R(D)
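To illustrate the register-to-register semantics listed in FIG. 9, the basic instructions can be modelled as operations on a register file. This is a sketch under assumptions that the figure does not state. Registers are taken to be 32 bits wide with wrap-around arithmetic. SFT is treated as a logical left shift by the low five bits of R(S2), although the figure gives neither the direction nor the range. The function name `execute_basic` is hypothetical.

```python
from typing import List

MASK32 = 0xFFFFFFFF  # assumed register width: 32 bits


def execute_basic(regs: List[int], op: str, s1: int = 0, s2: int = 0, d: int = 0) -> None:
    """Apply one FIG. 9 basic instruction to the register file `regs` in place."""
    if op == "NOP":
        return  # DO NOTHING
    a, b = regs[s1], regs[s2]
    if op == "ADD":
        result = a + b
    elif op == "SUB":
        result = a - b
    elif op == "AND":
        result = a & b  # bitwise logical product
    elif op == "OR":
        result = a | b  # bitwise logical sum
    elif op == "EOR":
        result = a ^ b  # bitwise exclusive OR
    elif op == "NOT":
        result = ~a  # NOT R(S1), R(D): R(S2) is not used
    elif op == "SFT":
        result = a << (b & 31)  # assumed: logical left shift by R(S2) mod 32
    else:
        raise ValueError(f"not a basic instruction: {op}")
    regs[d] = result & MASK32  # wrap the result to 32 bits
```

For example, with R(1) = 5 and R(2) = 7, `execute_basic(regs, "ADD", 1, 2, 3)` leaves 12 in R(3).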




FIG. 10

1. BASIC INSTRUCTION (32 BITS)
   | OP | F | S1 (5 BITS) | S2 (5 BITS) | D (5 BITS) |

2. BRANCH INSTRUCTION
   BRA, CALL, RTN   | OP | d |
   BRAcc            | OP | CC | d |

3. LOAD STORE INSTRUCTION
   | OP | S1 | S2 | D |


FIG. 11
[Timing chart: successive pairs of ADD instructions issued together, each passing through stages IF, R, EX, W.]






FIG. 12
[Timing chart: LOAD/ADD, STOR/ADD and LOAD/ADD instruction pairs passing through stages IF, R, EX, W; the memory instructions additionally show TRANSFER ADDRESS, TRANSFER DATA, READ CACHE and WRITE CACHE steps.]



FIG. 13
[Timing chart: ADD and BRA instructions, INSTRUCTION 1 and INSTRUCTION 2, then JUMPED-TO INSTRUCTION 1 and JUMPED-TO INSTRUCTION 2, each passing through pipeline stages (IF, EX, W legible).]






FIG. 19 




FIG. 20 



[Legible reference numeral: 600.]







FIG. 22

ADDRESS   FIRST INSTRUCTION          SECOND INSTRUCTION
0         SFT R(1), R(2), R(3)       ADD R(4), R(5), R(6)
2         SFT R(7), R(8), R(9)       SFT R(10), R(11), R(12)
4         ADD R(14), R(15), R(16)    ADD R(17), R(18), R(19)

PC        FIRST INSTRUCTION          SECOND INSTRUCTION
0         SFT R(1), R(2), R(3)       ADD R(4), R(5), R(6)
2         SFT R(7), R(8), R(9)       NOP
3         SFT R(10), R(11), R(12)    NOP
4         ADD R(14), R(15), R(16)    ADD R(17), R(18), R(19)



FIG. 23

ADDRESS   FIRST INSTRUCTION          SECOND INSTRUCTION
0         SFT R(1), R(2), R(3)       ADD R(4), R(5), R(6)
2         SFT R(7), R(8), R(9)       NOP
4         SFT R(10), R(11), R(12)    NOP
6         ADD R(14), R(15), R(16)    ADD R(17), R(18), R(19)






FIG. 24

ADDRESS   FIRST INSTRUCTION          SECOND INSTRUCTION
0         ADD R(1), R(2), R(3)       ADD R(4), R(5), R(6)
2         LOAD R(3), R(10)           LOAD R(6), R(11)
4         ADD R(5), R(2), R(3)       ADD R(4), R(1), R(6)

PC        FIRST INSTRUCTION          SECOND INSTRUCTION
0         ADD R(1), R(2), R(3)       ADD R(4), R(5), R(6)
2         LOAD R(3), R(10)           NOP
3         LOAD R(6), R(11)           NOP
4         ADD R(5), R(2), R(3)       ADD R(4), R(1), R(6)



FIG. 25

ADDRESS   FIRST INSTRUCTION          SECOND INSTRUCTION
0         ADD R(1), R(2), R(3)       ADD R(4), R(5), R(6)
2         ADD R(1), R(5), R(8)       ADD R(8), R(9), R(10)
4         ADD R(12), R(13), R(14)    ADD R(15), R(16), R(17)

PC        FIRST INSTRUCTION          SECOND INSTRUCTION
0         ADD R(1), R(2), R(3)       ADD R(4), R(5), R(6)
2         ADD R(1), R(5), R(8)       NOP
3         ADD R(8), R(9), R(10)      NOP
4         ADD R(12), R(13), R(14)    ADD R(15), R(16), R(17)




FIG. 26
[Block diagram: CONFLICT DETECTOR (2300), CACHE (2301), FIRST MASK and SECOND MASK; other legible reference numerals: 103, 113, 115, 116, 117, 2302, 2303, 2305, 2306.]



FIG. 27

CONFLICT BIT   LSB OF PC   FIRST INSTRUCTION SIGNAL 115   SECOND INSTRUCTION SIGNAL 117
0              0           FIRST INSTRUCTION              SECOND INSTRUCTION
0              1           NOP                            SECOND INSTRUCTION
1              0           FIRST INSTRUCTION              NOP
1              1           SECOND INSTRUCTION             NOP
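The FIG. 27 table can be read as a selection function. Given the conflict bit stored with a cached pair and the least significant bit of the PC, it decides what drives the first instruction signal 115 and the second instruction signal 117. The sketch below encodes that table. It also assumes a PC update rule consistent with FIGS. 22, 24 and 25: the PC advances by 2 after a parallel issue and by 1 after a single issue. Instructions are plain strings, and the names `select`, `run` and `FIG22_CACHE` are hypothetical.

```python
from typing import Dict, List, Tuple

NOP = "NOP"


def select(first: str, second: str, conflict: bool, pc_lsb: int) -> Tuple[str, str]:
    """Return (signal 115, signal 117) as tabulated in FIG. 27."""
    if not conflict:
        return (first, second) if pc_lsb == 0 else (NOP, second)
    return (first, NOP) if pc_lsb == 0 else (second, NOP)


def run(cache: Dict[int, Tuple[str, str, bool]], pc: int, end: int) -> List[Tuple[int, str, str]]:
    """Issue cached pairs from `pc` until `end`, recording (PC, signal 115, signal 117)."""
    trace = []
    while pc < end:
        first, second, conflict = cache[pc & ~1]  # pair stored at the even address
        lsb = pc & 1
        trace.append((pc, *select(first, second, conflict, lsb)))
        # Assumed PC update: +2 after a parallel issue, +1 after a single issue.
        pc += 2 if (not conflict and lsb == 0) else 1
    return trace


# The stored program of FIG. 22, with the conflict bit set for the SFT/SFT pair.
FIG22_CACHE = {
    0: ("SFT R(1), R(2), R(3)", "ADD R(4), R(5), R(6)", False),
    2: ("SFT R(7), R(8), R(9)", "SFT R(10), R(11), R(12)", True),
    4: ("ADD R(14), R(15), R(16)", "ADD R(17), R(18), R(19)", False),
}
```

`run(FIG22_CACHE, 0, 6)` visits PCs 0, 2, 3 and 4. At PC 3 it issues SFT R(10), R(11), R(12) alone with a NOP in the other slot, as in the lower half of FIG. 22. Because the conflict bit is already stored in the cache, issue-time selection in this model needs only that bit and one PC bit.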






FIG. 29
[Block diagram: CACHE, CONFLICT DETECT, FIRST MASK (2302) and SECOND MASK (2303); other legible reference numerals: 113, 115, 116, 117, 2600, 2601, 2602, 2604.]






FIG. 30
[Timing chart: an ADD that generates a flag (GENERATE FLAG), a second ADD, a conditional branch BRAcc, and INSTRUCTIONS 1 to 3, each passing through stages IF, R, EX, W.]



FIG. 31A
PRIOR ART
[Instructions 1 to 4 executed in sequence, with INTERRUPT ALLOWED at each of the three boundaries between successive instructions.]



FIG. 31B
[Instructions 1 to 4 executed in sequence, with INTERRUPT ALLOWED at a single point only.]







[Drawing, figure number not legible: three panels, each labelled FIRST INSTRUCTION STORAGE LOCATION and SECOND INSTRUCTION STORAGE LOCATION.]


